ABSTRACT

This paper is focused on the computational analysis of 
collective discourse, a collective behavior seen in non- 
expert content contributions in online social media. We 
collect and analyze a wide range of real-world collective 
discourse datasets from movie user reviews to microblogs 
and news headlines to scientific citations. We show that 
all these datasets exhibit diversity of perspective, a prop- 
erty seen in other collective systems and a criterion in 
wise crowds. Our experiments also confirm that the 
network of different perspective co-occurrences exhibits 
the small-world property with high clustering of different
perspectives. Finally, we show that non-expert contribu- 
tions in collective discourse can be used to answer simple 
questions that are otherwise hard to answer. 

INTRODUCTION 

Collective behavior refers to social processes that are 
not centrally coordinated and emerge spontaneously 
(Blumer 1951). This definition distinguishes collective 
behavior from group behavior in a number of ways: (a) 
collective systems involve limited social interactions, (b) 
membership is fluid, and (c) it generates weak and un- 
conventional norms (Smelser 1963). Collective behavior 
is normally characterized by a complex system (Miller 
& Page 2007). A complex system is a system composed 
of interconnected parts (agents, processes, etc.) that as 
a whole exhibit one or more properties called emergent 
behavior. The emergent behavior, which is not obvious
from the properties of the individuals, is said to
be nonlinear (not derivable from the summations of the
activity of individual components).

Nonlinear behavior has been widely observed in nature 
in the past. Gordon (1999) explains how harvester ants 
achieve task allocation without any central control and 
only by means of continual adjustment. Moreover, she
argues that the cooperative behavior in the ant colony 
merely results from local interactions between individ- 
ual ants and not a central controller. For instance, in 
ant colonies individual members react to local stimuli 
(in the form of chemical scent) depending only on their 
local environment. In the absence of a centralized de- 
cision maker, ant colonies exhibit complex behavior to 
solve geometric problems like shortest paths to food or 
maximum distance from all colony entrances to dispose 
of dead bodies. 

Self-organized behavior is not specific to ants. Schools
of fish, flocks of birds, and herds of ungulate mammals
are other examples of complex systems among animal
groups (Fisher 2009). Similarly, pedestrians on a
crowded sidewalk exhibit self-organization that leads to 
forming lanes along which walkers move in the same di- 
rections (Boccara 2010). It is argued that all examples 
of complex systems exhibit common characteristics: 

1. They are composed of a large number of inter-connected
parts (i.e., agents).

2. The system is self-organized in that there is no central
controller.

3. They exhibit emergent behavior: properties seen in
the group but not observable in the individuals.



In social sciences, a lot of work has been done on collec- 
tive systems and their properties (Hong & Page 2009). 
However, there is little work that studies a collective
system in which individual members collectively describe 
an event or an object. In our work, we focus on the 
computational analysis of collective discourse, a collec- 
tive behavior seen in interactive content contribution in 
online social media (Qazvinian & Radev 2011). 

In this paper, we show that collective discourse exhibits
diversity of opinions, a property that is defined
by Surowiecki (2004) as a necessary criterion for wise
crowds.

BACKGROUND 

Previously, it has been argued that diversity is essential 
in intelligent collective decision-making. Page (2007) argues
that the diversity of people and groups, which enables
new perspectives, leads to better decision making.
He finds that the diversity of perspectives in a collec- 
tive system is associated with higher rates of innovation 
and can enhance the capacity for finding solutions to 
complex problems. Similarly, Hong & Page (2004) show 
that a random group of intelligent problem solvers can 
benefit from diversity and outperform a group of the best 
problem solvers. 

Prior work has also studied the diversity of perspec- 
tives in content contribution and text summarization. 
In prior work on evaluating independent contributions 
in content generation, Voorhees (1998) studied IR sys- 
tems and showed that relevance judgments vary sig- 
nificantly between humans but relative rankings are 
more stable across annotators. Similarly, van Halteren
& Teufel (2004) designed an experiment, which
asked 40 Dutch students and 10 NLP researchers to 
summarize a BBC news report, resulting in 50 differ- 
ent summaries. They calculated the Kappa statistic 
(Carletta 1996, Krippendorff 1980) and observed high 
inter-judge agreement, suggesting that the task of atomic 
semantic unit (factoid) extraction can be robustly per- 
formed in naturally occurring text. 

The diversity of perspectives and the unprecedented 
growth of the factoid inventory have influenced other 
research areas in Natural Language Processing such as 
text summarization and paraphrase generation. Summa- 
rization evaluations are performed by assessing the infor- 
mation content with respect to salience and diversity in 
the summaries that are generated automatically (Sparck-Jones
1999, van Halteren & Teufel 2003, Nenkova &
Passonneau 2004).

Leveraging the diverse range of perspectives has also 
played a critical role in developing new paraphrase gener- 
ation systems by providing massive amounts of data that 
is easily collectable. For instance, Chen & Dolan (2011)
collected highly parallel data for training paraphrase
generation systems from descriptions that participants
wrote for video segments from YouTube. Such parallel
corpora of document pairs that represent the same semantic
information in different languages can be extracted from
user contributions in Wikipedia and be used for learning
translations of words and phrases (Yih, Toutanova, Platt
& Meek 2011).

COLLECTIVE DISCOURSE 

With the growth of Web 2.0, millions of individuals are
involved in collective discourse. They participate in online
discussions, share their opinions, and generate content 
about the same artifacts, objects, and news events in 
Web portals like amazon.com, epinions.com, imdb.com 
and so forth. This massive amount of text is mainly 
written on the Web by non-expert individuals with dif- 
ferent perspectives, and yet exhibits accurate knowledge 
as a whole. 

In social media, collective discourse is often a collective 
reaction to an event. A collective reaction to a well- 
defined subject emerges in response to an event (a movie 
release, a breaking story, a newly published paper) in 
the form of independent writings (movie reviews, news 
headlines, citation sentences) by many individuals. To 
analyze collective discourse, we perform our analysis on 
a wide range of real-world datasets. 

Corpus Construction 

An essential step and an important contribution in our 
work is gathering a comprehensive corpus of datasets on 
collective discourse. We focus on social media consisting 
of independent contributions of many individuals. Fur- 
thermore, we focus on topics corresponding to specific 
items and events as opposed to issues that are evolving 



Dataset          #clusters   average #docs
Movie reviews    100         965
Microblogs       15          110
News headlines   25          55
Citations        25          52

Table 1. Number and size of collective discourse datasets 
studied in this paper. 

and diffuse either in time or scope such as the economy 
or education. Table 1 lists the set of collective discourse
corpora that we have analyzed as well as the number of 
datasets and average number of documents in each of 
them. In the following, we further explain each of these 
collective discourse corpora. 

Movie Reviews 

The first collective discourse that we are interested in an- 
alyzing is the set of reviews that non-expert users write 
about a movie. The set of online reviews about an object 
is a perfect case of collective human behavior. Upon its 
release, each movie, book, or product receives hundreds 
and thousands of online reviews from non-expert Web 
users. These reviews, while discussing the same object, 
focus on different aspects of the object. For instance, in 
movie reviews, some reviewers solely focus on a few fa- 
mous actors, while some discuss other aspects like music 
or screenplay. 

To study collective discourse in movie reviews, we col- 
lected all the user reviews for 100 randomly selected 
movies from the top 250 movies list in the Internet Movie 
Database (IMDB)^1. For each of these 100 movies, we also
obtained plot keywords provided on the IMDB website. 
Our collected corpora consist of more than 96,500 user 
reviews posted for movies from 19 different genres. 

The following excerpts are extracted from user reviews 
for the movie Pulp Fiction, and show how non-expert 
reviewers focus on different aspects of the movie. 

"... starred by many well-known Actors, such as: John 
Travolta, Samuel L. Jackson, Uma Thurman, Bruce 
Willis and many. Directed by Quentin Tarantino, the 
eccentric Director ..." 

"... Pulp fiction was nominated for seven academy 
awards and won only one for screen writing ..." 

"... Shocking, intelligent, exciting, hilarious and oddly 
though-provoking. Best hit: Jackson's Bible quote ..." 

Microblogs 

The second type of collective discourse that we study 
in our work is the set of tweets written about a news 
story. In addition to other advantages, using Twitter 
as a corpus of collective discourse does present unusual
challenges. In Twitter, posts are limited to 140 characters
and often contain information in an unusually compressed
form.

^1 http://www.imdb.com/chart/top

First, we use the set of tweets collected by Qazvinian,
Rosengren, Radev & Mei (2011) about Sarah Palin's divorce
rumor that was popular during the 2008 presidential
election campaigns. This dataset contains tweets
that are about this story and yet discuss it from differ- 
ent angles. For example, the following tweets are ex- 
tracted from this dataset and reveal various facts about 
the story. One aspect is that a blogger started the
rumor and is threatened with a libel suit. Another aspect
is that the rumor has been debunked on Facebook.

"Palins lawyer threatens divorce blogger with libel suit, 
gives her the option of receiving the summons at her 
resid... http://ow.ly/15JD06." 

"@jose3030 Palin divorce is supposedly debunked on 
Facebook, but I think they are just spinning it, until 
they can announce it. " 

"RT @mediaite: Sarah Palin uses Facebook to deny un- 
sourced divorce rumors - http://bit.ly/i4Xy6h CH." 

As our second Microblog dataset, we collected the tweets 
that talk about the cancellation rumors of 14 TV shows 
in August of 2011. For instance, one of our collected 
datasets is about the rumor that Charlie Sheen might 
go back to the TV show Two and a Half Men.

"Charlie Sheen Claims 'Discussions' About Returning
to 'Two and a Half Men': In Boston for his national
tour, C. http://bit.ly/hIbOWfy"

"Charlie Sheen "Two And A Half Men" Return Not Happening:
Report http://dlvr.it/LCThd"



News Headlines

Another collective discourse is seen when a story breaks
and various news agencies write stories about it. These
stories all talk about the same event, but view it from
different perspectives.

We collected 25 news clusters from Google News^2. Each
cluster consists of a set of unique headlines about the
same story, written by different sources. The following
example shows 3 headlines in our datasets that are about
hurricane Bill and its damage in Maine.

"Hurricane Bill sweeps several people into ocean."

"7-year-old girl swept away by Bill wave dies after rescue."

"Maine ranger: wave viewers didn't heed warnings."

Citation Sentences

The final collective discourse example that we study is
the set of citation sentences that different scholars write
about a specific paper. A citation sentence to an article,
P, is a sentence that appears in the literature and cites
P. Each citation to P may or may not discuss one of P's
contributions.

For example, the following set of citations to Eisner's
work (Eisner 1996) illustrates the set of factoids about
this paper and suggests that different authors who cite
a particular paper may discuss different contributions
(factoids) of that paper.

In the context of DPs, this edge based factorization
method was proposed by (Eisner, 1996).

Eisner (1996) gave a generative model with a cubic
parsing algorithm based on an edge factorization of
trees.

Eisner (1996) proposed an O(n^3) parsing algorithm for
PDC.

If the parse has to be projective, Eisner's bottom-up-span
algorithm (Eisner, 1996) can be used for the
search.

Other Collective Discourse Datasets

The study of collective discourse helps us understand
new aspects of an object that are hard to identify with a
single authoritative view. Collective discourse examples
are not limited to the datasets that we have collected.
For instance, studying a complete set of introductions
about PageRank enables us to learn about its important
aspects such as the algorithm, the damping factor,
and the Power method, as well as aspects that are less
known such as its use in the 1940s (Franceschet 2010).
Similar examples exist in different TV show synopses, book
descriptions, story narrations and many more.


DIVERSE PERSPECTIVES 

In social sciences, a perspective is defined as a map from 
reality to one's internal language, which is used to describe
millions of objects, events, or situations (Page
2007). Each word in the internal language refers to a concept
(factoid) that can be expressed by means of a spoken
language using various words or phrases (nuggets). More
accurately, a factoid is an atomic semantic unit, which 
can be represented using different phrasal information 
units or nuggets (Qazvinian & Radev 2011, van Halteren 
& Teufel 2003). 

For instance, the "death of a 7-year-old girl" and "kid, 7,
dies" are the same factoid about the hurricane Bill story
but represented differently (using different nuggets).
Sweeping several people and warnings before the hurri- 
cane are some other factoids in the set of headlines about 
this story. These factoids show that different news re- 
porters focus on different aspects of the hurricane story. 
Similarly, "Sarah Palin using Facebook to debunk the 
rumor" is a factoid in the microblog dataset, and "a 
bible quote mentioned by Samuel Jackson" is a factoid 
that appears in the movie reviews about Pulp Fiction. 

Factoid    #tweets       Description
FB         414           debunked on Facebook
FAMILY     106           family values
ALASKA     87            Alaska report's evidence
QUIT       72            resignation and divorce
AFFAIRS    (illegible)   affairs
GAY        36            gay marriage ban
CAMP       36            her camp denies the rumor
MONTANA    33            moving to Montana
LIBEL      24            libel suit against the rumor
BLOG       19            blogger who started the rumor


Table 2. Different factoids extracted from the Palin 
dataset with the number of tweets that mention them, 
and short descriptions. 



Annotations 

For each collective discourse dataset, we construct the 
set of factoids that represent various aspects of a story 
or a movie or different contributions of a paper. 

For the microblogs dataset, we asked two annotators to 
go over all the tweets and identify a set of factoids that 
represent different aspects of each rumor. We then man- 
ually marked each tweet with the factoid that is relevant 
to the tweet. Each factoid is usually covered by a num- 
ber of tweets, and each tweet covers one or more factoids. 
However, we did not observe any tweets that cover more 
than 2 factoids in our datasets. The small number of 
factoids covered by each tweet is most likely due to the 
length limit enforced by Twitter on each post. 

Table 2 lists the factoids extracted from the Sarah Palin's
divorce rumor dataset. This table shows that 414
tweets discuss how "Facebook is used to debunk the
rumor," while the "libel suit against the blogger who
started the rumor" is only mentioned in 24 of the
total 789 tweets.

To calculate the inter-judge agreement, we annotated
100 microblog instances on Sarah Palin twice, and calculated
the Kappa statistic as

\[ \kappa = \frac{\Pr(a) - \Pr(e)}{1 - \Pr(e)} \]

where Pr(a) is the relative observed agreement between
the two annotators on the 10 factoids from Table 2
and Pr(e) is the probability that the annotators agree by
chance if each annotator is randomly assigning categories.
Based on this formulation, we reach a value of
κ = 0.913, with 93% agreement between the two annotators.
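The agreement computation can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each instance carries exactly one factoid label per annotation pass and estimates Pr(e) from each pass's label frequencies (Cohen's formulation). The function name and the toy labels are hypothetical.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Return (kappa, observed agreement) for two equal-length label lists."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Pr(a): fraction of items on which the two annotation passes agree.
    pr_a = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Pr(e): chance agreement, estimated from each pass's label frequencies
    # (an assumption; the paper does not spell out the estimator).
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    pr_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if pr_e == 1.0:
        return 1.0, pr_a  # degenerate case: both passes used a single label
    return (pr_a - pr_e) / (1 - pr_e), pr_a

# Hypothetical toy annotations using factoid names from the Palin dataset.
first_pass = ["FB", "FB", "LIBEL", "BLOG", "FB", "FAMILY"]
second_pass = ["FB", "FB", "LIBEL", "FB", "FB", "FAMILY"]
kappa, agreement = cohen_kappa(first_pass, second_pass)
```

On the toy data the two passes disagree on one of six items, so the observed agreement is 5/6 while the chance-corrected value is lower.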

We also annotated the set of citations and news head- 
lines in the same fashion. Particularly, we asked two an- 
notators to extract factoids for each of the 25 news and 
citation clusters, and then match individual documents 
(headline or citation sentence) with relevant factoids. 
Previously we have shown high agreement in human
judgments for extracting factoids from these datasets
(κ ≈ 0.8) (Qazvinian & Radev 2011).



Dataset          Number of factoids
Movie reviews    131.31 ± 52.67
Microblogs       2.93 ± 2.05
News headlines   7.48 ± 4.02
Citations        5.48 ± 1.96



Table 3. Average number of factoids in various collective 
discourse corpora. 

For the movie review clusters, we downloaded the list of 
cast names as well as the list of plot keywords provided 
for each movie by IMDB, as the set of factoids about the 
movie^3.

Table 3 lists the average number of factoids for each collective
discourse corpus. For the movie reviews, there
is an average of 131 factoids per movie, and for citations,
headlines and microblogs, our annotators identify
an average of 5, 7, and 3 factoids respectively.

Diversity 

Surowiecki (2004) defines 4 criteria for a crowd to be 
wise: (1) people in the crowd should have diverse knowl- 
edge of facts (diversity); (2) people should act indepen- 
dently and their opinion should not be affected by that of 
others (independence); (3) people should have access to 
local knowledge (decentralization); and (4) a mechanism
should exist to turn individual judgments into collective
intelligence (aggregation).

Here, we present evidence that the individuals who en- 
gage in collective discourse have diverse perspectives and 
interpret things differently. 

Novelty and Redundancy 

To investigate the diversity of perspectives, we look at 
the frequency distribution of various factoids in differ- 
ent corpora by extracting the number of individuals that 
mention each factoid, f, in the annotated clusters. Figure 1
shows the log-log scale cumulative probability distribution
for these counts (i.e., the probability that a
factoid will be mentioned by at least c different people)
in all of our collective discourse corpora. This figure suggests
that factoid mention frequencies exhibit a highly
skewed distribution with many factoids mentioned only 
once and a very few factoids mentioned by a large num- 
ber of people. For instance, in the Pulp Fiction example, 
"Bruce Willis" and "Quentin Tarantino" are very pop- 
ular factoids and most reviewers mention them, while 
"Rene Beard", "Frank Whaley" (two other actors), or 
"Jackson's bible quote" are among many factoids that 
are not as frequently mentioned. 
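The cumulative distribution described above can be computed directly from the per-factoid mention counts. The sketch below is illustrative only; the function name and the example counts are hypothetical and are not taken from the paper's data.

```python
from collections import Counter

def cumulative_distribution(mention_counts):
    """Map each observed count c to P(a factoid is mentioned by >= c people)."""
    total = len(mention_counts)
    histogram = Counter(mention_counts)
    ccdf, at_least = {}, 0
    # Walk counts from largest to smallest, accumulating the tail mass.
    for c in sorted(histogram, reverse=True):
        at_least += histogram[c]
        ccdf[c] = at_least / total
    return dict(sorted(ccdf.items()))

# Hypothetical counts: number of people mentioning each of eight factoids.
counts = [1, 1, 1, 1, 2, 2, 5, 40]
ccdf = cumulative_distribution(counts)
```

Plotting the resulting pairs on log-log axes would give a curve of the kind the paragraph describes, with most of the mass at small counts and a long tail.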

^3 We admit that the set of cast names and plot keywords
provided by IMDB does not include all the factoids about
the movie. However, since creating gold standard data from
complete user reviews is fairly arduous, we did not pursue
manual annotations for movies.

[Figure 1: log-log plots; panels include Movie Reviews and Citations]

Figure 1. The cumulative probability distribution for the
frequency of factoids (i.e., the probability that a factoid
will be mentioned in c different summaries) in each
corpus.

SMALL-WORLD OF FACTOIDS

Recent research has shown that a wide range of natural
graphs such as biological networks (Ravasz, Somera,
Mongru, Oltvai & Barabasi 2002), food webs (Montoya
& Sole 2002), electronic circuits (Ferrer i Cancho, Janssen
& Sole 2001), brain neurons (Bassett & Bullmore 2006),
and human languages (Ferrer i Cancho & Sole 2001) exhibit
the small-world property. This common characteristic
can be detected from two basic statistical properties:
the clustering coefficient C, and the average shortest
path length ℓ.

The clustering coefficient of a graph measures the number
of closed triangles in the graph. It describes how
likely it is that two neighbors of a
vertex are connected (Newman 2003). Watts & Strogatz
(1998) define the clustering coefficient as the average of
the local clustering values for each vertex:

\[ C = \frac{\sum_{i=1}^{n} C_i}{n} \]

The local clustering coefficient, C_i, for the ith vertex is the
number of triangles connected to vertex i divided by the
total possible number of triangles connected to vertex i.
Watts & Strogatz (1998) show that small-world networks
are highly clustered and have relatively short paths
(i.e., ℓ is small). These networks are usually studied in
contrast with random networks, in which both ℓ and C
obtain small values.
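Both statistics can be computed with a few lines of standard-library code. The following is a sketch under stated assumptions: the graph is undirected and stored as a dictionary of neighbour sets, vertices with fewer than two neighbours get a local coefficient of 0, and the path length is averaged only over connected pairs. All names and the toy graph are hypothetical.

```python
from collections import deque
from itertools import combinations

def local_clustering(adj, v):
    """Fraction of pairs of v's neighbours that are themselves connected."""
    neighbours = adj[v]
    k = len(neighbours)
    if k < 2:
        return 0.0  # convention assumed here: no triangle is possible
    closed = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
    return closed / (k * (k - 1) / 2)

def average_clustering(adj):
    """Average of the local clustering values over all vertices."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)

def average_shortest_path(adj):
    """Mean BFS distance over all ordered pairs of connected vertices."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0

# Toy undirected graph: a triangle (a, b, c) with a pendant vertex d on c.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
clustering = average_clustering(adj)
path_length = average_shortest_path(adj)
```

In the toy graph, vertices a and b have local coefficient 1, c has 1/3, and d has 0, so the average is 7/12.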

To understand the relationship between various aspects 
of a story or subject and to study the relationship be- 
tween different individuals' contributions, we analyze the 
network of factoids. 

For each dataset, we build a network in which nodes rep- 
resent different factoids and there is an edge between two 



C       C_random   ℓ       ℓ_random
0.814   0.072      1.613   2.627

Table 4. Average clustering coefficient (C) and the average
shortest path length (ℓ) in the networks of the collective
discourse corpora and the corresponding random
networks.



nodes if the corresponding factoids have been mentioned 
together in at least 10 documents. Using these networks, 
we would like to investigate whether there are many fac- 
toid pairs that co-occur in individual user contributions, 
and whether there are communities of factoids that co- 
occur more frequently than others. For each network,
we generate a random network with the same number of
nodes and edges using the Erdős-Rényi model,
which sets an edge between each pair of nodes with equal
probability, independently of the other edges (Erdős &
Rényi 1960).
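The network construction and its random baseline might look like the sketch below. It is an illustration under assumptions: each document is represented as the list of factoids it mentions, and the baseline uses the fixed-edge-count variant of the Erdős-Rényi model (sampling a given number of node pairs uniformly), which matches "same number of nodes and edges" but is only one reading of the description. Function names and the example documents are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations

def cooccurrence_edges(doc_factoids, min_docs=10):
    """Edges between factoids mentioned together in at least min_docs documents."""
    pair_counts = Counter()
    for factoids in doc_factoids:
        # Sort so that each unordered pair is always counted under one key.
        pair_counts.update(combinations(sorted(set(factoids)), 2))
    return {pair for pair, n in pair_counts.items() if n >= min_docs}

def random_graph_same_size(nodes, num_edges, seed=0):
    """Sample num_edges distinct node pairs uniformly at random."""
    rng = random.Random(seed)
    all_pairs = list(combinations(sorted(nodes), 2))
    return set(rng.sample(all_pairs, num_edges))

# Hypothetical annotated documents, each listed with the factoids it mentions.
docs = [["FB", "CAMP"]] * 12 + [["FB", "LIBEL", "BLOG"]] * 10 + [["QUIT", "FB"]] * 3
edges = cooccurrence_edges(docs)
nodes = {f for doc in docs for f in doc}
random_edges = random_graph_same_size(nodes, len(edges))
```

With the threshold of 10 documents, the pair (FB, QUIT) is dropped because it co-occurs only three times in the toy data.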

Table 4 lists the average clustering coefficient (C) and the
average shortest path length (ℓ) in the networks built
using factoid co-occurrences. This table confirms that
the clustering coefficient in the factoid networks is generally
significantly greater than in random networks of the
same size. Moreover, this table confirms that the average
shortest paths in the random networks are small.

Ferrer i Cancho & Sole (2001) and Motter, de Moura, 
Lai & Dasgupta (2002) perform similar experiments and 
show that the word co-occurrence and word synonymy 
networks have small-world properties. However, we believe
that this is the first work that shows the small-world
effect in human language at the factoid level (network
of concepts). This finding further justifies the conclusion
made by Motter et al. (2002), who emphasize
that human memory is associative (i.e., information is
retrieved by connecting similar concepts) and that the
small-world property of the network maximizes the retrieval
efficiency. More precisely, high clustering of the
network causes similar pieces of information to be stored
together, and short paths cause very different
pieces of information to be separated only by a few links,
guaranteeing a fast search.

WISE CROWDS 

Previous work has studied crowd wisdom in online content
contributions. Wikipedia, for instance, has been
named as an example of a successful collective effort.
Kittur, Chi, Pendleton, Suh & Mytkowicz (2007) study
user contributions in Wikipedia and suggest that the
main workload in Wikipedia is driven by "common" users
and that admin influence has dramatically decreased
over the years. Furthermore, Kittur & Kraut (2008) show
that adding more editors to an article results in higher
article quality when appropriate coordination techniques
are used. In this section, we present some evidence of
wisdom in collective discourse that is not achievable from
individuals or from smaller groups. In our experiments,
Rank   Genre       Score         Relevant
1      action      (illegible)   1
2      sci-fi      (illegible)   1
3      war         0.105         0
4      fantasy     0.087         1
5      history     (illegible)   0
6      animation   0.062         0
7      adventure   0.051         1
8      romance     0.039         0
9      drama       0.025         0
10     family      0.023         0

Table 5. Top 10 genres extracted for the movie "Avatar" 
from user reviews. 

we try to answer a simple question about a movie just 
by using its set of reviews. 

The question we try to answer is to find each movie's 
genre. As the gold standard, we collected the genres for 
each of the 100 movies for which we had user reviews. 
Each movie is associated with a few (3-4) genres out of 
a total of 19 genre names. 

To extract the list of possible genres for a movie, we 
match all the genre names against the reviews and rank 
them based on their relative frequency. More particularly,
the score of each genre, g, for a movie with N
reviews (D_1 ... D_N) is calculated as

\[ \mathrm{score}(g) = \frac{|\{D_i : D_i \text{ mentions } g\}|}{N} \]
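The scoring rule amounts to counting, for each genre name, the fraction of reviews that contain it. The sketch below is a simplified illustration: it assumes case-insensitive substring matching (which can over-match, for example a short genre name inside a longer word), and the function name and mini-corpus are hypothetical.

```python
def genre_scores(reviews, genres):
    """Score each genre by the fraction of reviews that mention its name."""
    n = len(reviews)
    lowered = [review.lower() for review in reviews]
    # A review counts once per genre, however often the name is repeated.
    scores = {g: sum(g.lower() in text for text in lowered) / n for g in genres}
    # Rank genres by descending score; ties are broken alphabetically.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical mini-corpus; the genre list is a subset chosen for illustration.
reviews = [
    "A stunning sci-fi action epic.",
    "Great action scenes, thin plot.",
    "More fantasy than sci-fi, but fun.",
    "Visually rich adventure.",
]
ranking = genre_scores(reviews, ["action", "sci-fi", "fantasy", "adventure", "war"])
```

A production version would likely tokenize the reviews rather than match substrings; that choice is left open here because the source does not specify the matching procedure.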

Table 5 lists the top 10 genres retrieved for the movie
"Avatar" from user reviews together with the score of
each genre and the relevance according to the gold standard
that we obtained from IMDB. This table shows an
example in which all the 4 genre names for Avatar are
among the 7 genres most frequently mentioned by
non-expert users.

To evaluate the ranked list of retrieved genre names, we
use Mean Average Precision and F-score. The Mean
Average Precision (MAP) for a set of queries (movie
names in our experiments) is calculated as the mean of
the average precision scores for each query. The average
precision for each query, q, is calculated as

\[ \mathrm{AP}(q) = \frac{\sum_{k=1}^{n} \mathrm{Precision@}k \times rel(k)}{\text{number of relevant genres}} \]

where n is the number of retrieved genres and rel(k) obtains
a value of 1 if the kth retrieved
genre is correct and 0 otherwise. We also calculate the F-score
when the top 3 genres of the ranked list are
retrieved as relevant. Table 6 lists the results of this
experiment.
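The evaluation measures can be sketched as follows. This is an illustration, not the authors' evaluation script: the F-score is assumed to be the balanced F1 over the top k retrieved genres, and the gold set for "Avatar" is the one implied by the discussion of Table 5. Function names are hypothetical.

```python
def average_precision(ranked, relevant):
    """Sum Precision@k over ranks k holding a relevant item, divided by |relevant|."""
    hits, total = 0, 0.0
    for k, item in enumerate(ranked, start=1):
        if item in relevant:      # rel(k) = 1
            hits += 1
            total += hits / k     # Precision@k
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(rankings, gold):
    """Mean of the per-query average precision (queries are movie names here)."""
    return sum(average_precision(rankings[q], gold[q]) for q in rankings) / len(rankings)

def f_score_at(ranked, relevant, k=3):
    """Balanced F1 of the top-k retrieved items (assumed form of the F-score)."""
    retrieved = set(ranked[:k])
    tp = len(retrieved & set(relevant))
    if tp == 0:
        return 0.0
    precision, recall = tp / len(retrieved), tp / len(relevant)
    return 2 * precision * recall / (precision + recall)

# Genre order as listed in Table 5; the gold set is assumed for illustration.
avatar_ranked = ["action", "sci-fi", "war", "fantasy", "history",
                 "animation", "adventure", "romance", "drama", "family"]
avatar_gold = {"action", "sci-fi", "fantasy", "adventure"}
ap = average_precision(avatar_ranked, avatar_gold)
f3 = f_score_at(avatar_ranked, avatar_gold, k=3)
map_score = mean_average_precision({"Avatar": avatar_ranked}, {"Avatar": avatar_gold})
```

For this single ranking the relevant genres sit at ranks 1, 2, 4 and 7, so the average precision is (1 + 1 + 3/4 + 4/7) / 4.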

To see how useful the set of reviews is for this particular 
task, we compare it with ranking genre names randomly 



Method    MAP     95% C.I.         F-score   95% C.I.
Reviews   0.698   [0.657, 0.740]   0.550     [0.499, 0.600]
Random    0.260   [0.229, 0.290]   0.140     [0.101, 0.179]



Table 6. Mean Average Precision and F-score for genre 
extraction from a set of reviews (C.I.: Confidence Inter- 
val). 

and repeating the experiment. As Table 6 shows, using 
simple mention frequency measures provides significant 
improvements over guessing the genre randomly. 

The numbers in Table 6 are calculated using all the user
reviews collected for each movie (ranging from a few hundred
to a few thousand per movie). Here, we would
like to investigate if having more reviews will give us 
a more accurate estimate of the genres associated with 
each movie. 

Figure 2 plots the 95% confidence interval of MAP versus
the number of randomly selected user reviews used
to rank the genres for each movie. This figure, which 
is plotted on a semi-log scale, shows that the quality of 
ranking grows rapidly by the 100th randomly selected 
review and exhibits asymptotic behavior when more re- 
views are visited. 



[Figure 2: plot of MAP (y-axis, roughly 0.35 to 0.8) against Number of Reviews (x-axis, log scale)]

Figure 2. Mean Average Precision (MAP) versus the 
number of reviews used to extract each movie genre. (The 
shaded area shows 95% confidence interval for each MAP 
result) 



CONCLUSION AND FUTURE WORK 

We studied collective discourse and investigated diverse 
perspectives when a number of non-expert Web users 
engage in collective behavior and generate content on 
the Web. We show that the set of people who discuss 
the same story or subject have diverse perspectives, in- 
troducing new aspects that have not been previously dis- 
cussed by others. 
We analyzed a wide range of collective discourse examples,
from movie reviews and news stories to scientific
citations and microblogs. To the best of our knowledge,
this is the first work that studies the diversity in
perspectives, and the small-world effect in factoid
co-occurrences. We also perform an experiment that provides
some evidence of collective intelligence in the collectively
written set of reviews by non-expert users.

The ultimate goal of this work is to develop models
of collective discourse. The models would be informed
by empirical analysis of varied and large-scale
datasets and would address various aspects of collective
discourse: motivation behind continuous contributions,
heterogeneity and diversity in perspectives, and collective
intelligence from collaboration. By formulating simple
stochastic models of individual and group behavior,
we may be able to predict phenomena on the macro level
of discourse. We will try to address these questions
by developing state-of-the-art technologies in computational
linguistics, network science and social theories of
mass communications.